
[Bugfix] LoRA for DeepSeek V3.2#35077

Merged
jeejeelee merged 18 commits into vllm-project:main from HollowMan6:fused_qkv_a_proj
Apr 22, 2026

Conversation

Contributor

@HollowMan6 HollowMan6 commented Feb 23, 2026

Purpose

This PR fixes LoRA regressions seen with DeepSeek V3.2/DSA:

  1. LoRA module registration failed for fused_qkv_a_proj with an assertion that the module was not a BaseLayerWithLoRA.
  2. After that fix, MLA weight post-processing failed with AttributeError: 'ColumnParallelLinearWithLoRA' object has no attribute 'quant_method'.
   File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 46, in load_lora_model
     return self.lora_manager.create_lora_manager(model, vllm_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in create_lora_manager
     lora_manager = create_lora_manager(
                    ^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 895, in create_lora_manager
     lora_manager = lora_manager_cls(
                    ^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 807, in __init__
     super().__init__(
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 111, in __init__
     self._create_lora_modules()
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 407, in _create_lora_modules
     self.register_module(module_name, new_module)
   File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 414, in register_module
     assert isinstance(module, BaseLayerWithLoRA), (
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 AssertionError: Module model.layers.0.self_attn.fused_qkv_a_proj must be a BaseLayerWithLoRA instance, got <class 'vllm.model_executor.models.deepseek_v2.DeepSeekV2FusedQkvAProj'>
File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
     output = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
   File "/mnt/data/user/songlin/verl/verl/workers/rollout/vllm_rollout/utils.py", line 273, in update_weights_from_ipc
     process_weights_after_loading(model, model_config, self.device)
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 117, in process_weights_after_loading
     module.process_weights_after_loading(model_config.dtype)
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/layers/attention/mla_attention.py", line 655, in process_weights_after_loading
     kv_b_proj_weight = get_and_maybe_dequant_weights(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/utils/quant_utils.py", line 333, in get_and_maybe_dequant_weights
     if layer.quant_method is None or isinstance(
        ^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1965, in __getattr__
     raise AttributeError(
 AttributeError: 'ColumnParallelLinearWithLoRA' object has no attribute 'quant_method'
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 771, in worker_main
    worker = WorkerProc(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 597, in __init__
    self.worker.load_model()
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 336, in load_model
    self.model_runner.load_model(load_dummy_weights=dummy_weights)
  File "/usr/local/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4222, in load_model
    self.model = self.load_lora_model(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 46, in load_lora_model
    return self.lora_manager.create_lora_manager(model, vllm_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in create_lora_manager
    lora_manager = create_lora_manager(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 895, in create_lora_manager
    lora_manager = lora_manager_cls(
                   ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 807, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 111, in __init__
    self._create_lora_modules()
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 407, in _create_lora_modules
    self.register_module(module_name, new_module)
  File "/usr/local/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 414, in
    assert isinstance(module, BaseLayerWithLoRA), (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Module model.layers.3.mlp.gate must be a BaseLayerWithLoRA instance, got <class 'vllm.model_executor.layers.fused_moe.router.gate_linear.GateLinear'>
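The first failure boils down to how the LoRA layer-replacement check matches layer types. The toy sketch below (not vLLM's actual classes, just stand-ins reusing the real class names) shows why an exact-type comparison misses subclasses such as DeepSeekV2FusedQkvAProj, leaving them unwrapped so that register_module() later hits the assertion:

```python
# Stand-ins for the real vLLM classes, just to illustrate the type check.
class MergedColumnParallelLinear:
    pass

class DeepSeekV2FusedQkvAProj(MergedColumnParallelLinear):
    pass

def can_replace_strict(layer) -> bool:
    # Pre-fix style of check: exact type only, so subclasses fall through.
    return type(layer) is MergedColumnParallelLinear

def can_replace_fixed(layer) -> bool:
    # Post-fix style of check: isinstance() also accepts subclasses.
    return isinstance(layer, MergedColumnParallelLinear)

layer = DeepSeekV2FusedQkvAProj()
print(can_replace_strict(layer))  # False -> layer never gets a LoRA wrapper
print(can_replace_fixed(layer))   # True  -> layer is wrapped as expected
```

With the strict check, the fused projection layer is skipped during LoRA module creation, which is exactly the state the assertion in register_module() detects.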

Test Plan

Added unit test cases, and also ran an end-to-end test manually.

Test Result

All tests pass without the errors above.




Copilot AI review requested due to automatic review settings February 23, 2026 04:28

dosubot Bot commented Feb 23, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


mergify Bot added the labels deepseek (Related to DeepSeek models) and bug (Something isn't working) on Feb 23, 2026
gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request addresses two LoRA regressions related to DeepSeek V3.2/DSA models. The changes primarily involve modifying type checks from type(obj) is Class to isinstance(obj, Class) to correctly handle subclasses, and introducing a mechanism to unwrap LoRA linear wrappers before accessing quantization metadata. The added test cases validate these fixes, ensuring that LoRA modules are registered correctly and weight post-processing functions can access quant_method attributes as expected. The changes are well-targeted and directly resolve the reported issues, improving the robustness of LoRA integration with various model architectures.

Comment thread tests/lora/test_layers.py Outdated
Comment thread tests/lora/test_layers.py Outdated
Comment thread vllm/model_executor/layers/quantization/utils/quant_utils.py
Copilot AI (Contributor) left a comment


Pull request overview

This PR fixes two LoRA regressions encountered with DeepSeek V3.2/DSA that prevented LoRA adapters from being applied to the custom DeepSeekV2FusedQkvAProj layer, which is a subclass of MergedColumnParallelLinear.

Changes:

  • Modified LoRA layer replacement logic to support subclasses of MergedColumnParallelLinear by changing type() is checks to isinstance() checks
  • Added unwrapping logic in get_and_maybe_dequant_weights() to handle LoRA wrappers transparently by accessing the underlying base_layer
  • Added comprehensive test coverage for both fixes
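The unwrapping idea from the second bullet can be sketched framework-free as follows. The class and function names here are illustrative stand-ins, not vLLM's actual API (the real change lives in get_and_maybe_dequant_weights()):

```python
class QuantLinear:
    """Stand-in for a linear layer carrying quantization metadata."""
    quant_method = None  # None means "unquantized" in this toy example
    weight = "W"

class LoRAWrapper:
    """Stand-in for BaseLayerWithLoRA subclasses: they keep the original
    layer as .base_layer and do not mirror its quant_method attribute,
    which is why the AttributeError in failure 2 occurred."""
    def __init__(self, base_layer):
        self.base_layer = base_layer

def unwrap_base_layer(layer):
    # Peel off any LoRA wrappers so callers read quantization metadata
    # from the underlying layer instead of the wrapper.
    while hasattr(layer, "base_layer"):
        layer = layer.base_layer
    return layer

wrapped = LoRAWrapper(QuantLinear())
base = unwrap_base_layer(wrapped)
print(base.quant_method is None)  # True: metadata is reachable again
```

An unwrapped layer passes through unchanged, so call sites do not need to know whether LoRA is enabled.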

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • vllm/model_executor/layers/quantization/utils/quant_utils.py: Adds automatic unwrapping of LoRA wrappers in get_and_maybe_dequant_weights() to access the base layer's quantization metadata
  • vllm/lora/layers/column_parallel_linear.py: Changes type checks from type() is to isinstance() for MergedColumnParallelLinear to support custom subclasses like DeepSeekV2FusedQkvAProj
  • tests/lora/test_layers.py: Adds test cases for subclassed MergedColumnParallelLinear layer replacement and for get_and_maybe_dequant_weights() with LoRA wrappers


HollowMan6 force-pushed the fused_qkv_a_proj branch 4 times, most recently from 70ae7a3 to 0b1d296 on March 3, 2026 21:17
HollowMan6 force-pushed the fused_qkv_a_proj branch 4 times, most recently from 37a5cc3 to 8430f84 on March 14, 2026 14:05
HollowMan6 force-pushed the fused_qkv_a_proj branch 2 times, most recently from 0e2c03e to fe5f7d2 on March 16, 2026 20:17

mergify Bot commented Mar 17, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @HollowMan6.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify Bot commented Mar 17, 2026

Hi @HollowMan6, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

HollowMan6 force-pushed the fused_qkv_a_proj branch 2 times, most recently from dd3c039 to 162299b on March 18, 2026 15:19
Signed-off-by: Hollow Man <hollowman@opensuse.org>
jeejeelee (Collaborator) left a comment


Overall LGTM except the final comment.

@@ -202,6 +204,12 @@ def all_gather(self, input_: torch.Tensor, dim: int = -1) -> torch.Tensor:
+ (self.world_size * input_size[dim],)
+ input_size[dim + 1 :]
)
# When the gathered dimension has size 1, torch.compile can preserve a
Collaborator commented:

I think we should move these changes to lora/

HollowMan6 (Contributor Author) replied:

Thank you, I just found that this change is actually not necessary after the other fix, so I removed the related changes here.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@HollowMan6 HollowMan6 requested a review from jeejeelee April 20, 2026 06:44
jeejeelee (Collaborator) left a comment

thank you

@jeejeelee jeejeelee enabled auto-merge (squash) April 20, 2026 07:39
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 20, 2026
auto-merge was automatically disabled April 20, 2026 12:07

Head branch was pushed to by a user without write access


mergify Bot commented Apr 20, 2026

Hi @HollowMan6, the pre-commit checks have failed again; the same instructions as in the Mar 17, 2026 comment above apply.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@jeejeelee jeejeelee merged commit a250f1b into vllm-project:main Apr 22, 2026
81 checks passed
@HollowMan6 HollowMan6 deleted the fused_qkv_a_proj branch April 22, 2026 11:35
Comment on lines 1622 to +1629
finally:
# Note: for some reason DeepEP buffers don't seem to be
# entirely reusable on B200. In order to work around this
# we clear the all2all manager's cache after each testpoint.
cap = current_platform.get_device_capability()
if (
cap is not None
and cap.major == 10
and (
test_config.backend == "deepep_low_latency"
or test_config.backend == "deepep_high_throughput"
)
):
# DeepEP managers are not reliably reusable across many subtests in
# a single worker process. Tear them down after each DeepEP case so
# later subtests do not inherit stale communication state.
if test_config.backend in {
"deepep_low_latency",
"deepep_high_throughput",
}:
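The replacement hunk above reduces the teardown guard to a backend check alone: the device-capability gate (cap.major == 10, i.e. B200-only) is dropped, so DeepEP managers are torn down after every DeepEP subtest regardless of GPU. A minimal sketch of that condition (the function name is illustrative, not from the test file):

```python
# Backends whose all2all managers should be cleared after each subtest,
# since stale communication state can leak between DeepEP test cases.
DEEPEP_BACKENDS = {"deepep_low_latency", "deepep_high_throughput"}

def should_clear_all2all_cache(backend: str) -> bool:
    # No device-capability check: any DeepEP backend triggers cleanup.
    return backend in DEEPEP_BACKENDS

print(should_clear_all2all_cache("deepep_low_latency"))    # True
print(should_clear_all2all_cache("deepep_high_throughput"))  # True
print(should_clear_all2all_cache("naive"))                 # False
```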
Member left a comment

I think this breaks the CI #40637

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think so, as all the CI checks passed before the merge, including the specific one you mentioned: https://buildkite.com/vllm/ci/builds/62466/steps/canvas?sid=019db40c-c48c-4238-b1ca-827533eb7d09&tab=output

Also d22887b was introduced specifically for fixing that CI.

Member replied:

Ohh, that makes sense; the nightly build happens at 2am.

baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yifan <yzong@redhat.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Adrian <info@zzit.ch>

Labels

bug (Something isn't working), deepseek (Related to DeepSeek models), ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants